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SUMMARY 


Given  a  sparse  nonsquare  system  of  linear  equations  Mx  =  b  where  MTM  is 
either  dense  or  full,  we  present  a  direct  method  that  generates  a  least  squares 
solution  of  the  original  system  Mx  by  solving  a  smaller  least  squares  problem. 
The  method  accomplishes  th  ecomposition  by  applying  orthogonal 
transformations  to  a  restructured  ~.<n  of  the  original  system  of  equations.  The 
algorithms  derived  from  the  decomposition  result  are  well-suited  for  both 
sequential  and  parallel  architecture  machines.  In  a  specific  Navy  signal  processing 
application,  the  presented  algorithm  computed  on  a  Sun  SPARC  10  workstation  a 
least  squares  solution  of  a  rank  deficient  system  comprising  703  equations  and 
592  variables  in  a  number  of  floating  point  computations  tenfold  smaller  than  a 
method  that  does  not  exploit  the  sparsity  structure  of  M. 
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1.0  INTRODUCTION 


The  method  of  least  squares  is  a  fundamental  computational  tool  for 
estimating  unknown  parameters,  curve  fitting,  data  smoothing  as  well  as  solving 
nonsquare  systems  of  equations.  When  the  problem  is  formulated  as  a  nonsquare 
system  of  linear  equations 

Mx  =  b,  (1) 

in  which  M  is  an  m  by  n  full  rank  or  rank  deficient  matrix,  the  linear  least  squares 
problem  is  expressed  in  the  standard  form 


min  ||  Mx  -  b  ||2 .  (2) 

x 

A  number  of  efficient,  direct  methods  for  solving  large  and  sparse  linear  least 
squares  problems  have  been  developed  using  orthogonalization,  elimination  and 
augmentation  (George  &  Heath,  1980;  Golub,  1965;  Hachtel,  1974;  Heath,  1982; 
Heath,  1984;  Liu,  1986;  Peters  &  Wilkinson,  1970).  A  central  assumption  in  all 
these  methods  is  that  the  matrix  MTM  is  sparse.  In  this  work,  we  expand  the  choice 
of  methods  by  developing  a  method  that  efficiently  handles  the  case  in  which  M  is  a 
sparse  matrix  while  MTM  is  dense,  and  possibly  full.  Our  method  is  based  on  a 
decomposition  result  obtained  from  the  application  of  orthogonal  transformations  to 
a  restructured  form  of  the  system  Mx  =  b.  With  this  result,  the  original  sparse  linear 
least  squares  problem  is  reduced  to  a  smaller  least  squares  problem.  The  size  of 
this  smaller  problem  depends  on  the  original  zero-nonzero  structure  of  the  sparse 
matrix  M.  For  a  specific  Navy  signal  processing  application  in  which  M  is  a  703  by 
592  sparse  and  rank  deficient  matrix  and  MTM  is  full,  the  smaller  problem  consisted 
of  a  dense,  rank  deficient  system  comprising  231  equations  and  1 20  variables.  The 
application  of  our  direct  method  to  the  smaller  problem  produced  a  solution  to  the 
original  sparse  linear  least  squares  problem  in  a  number  of  floating  point 
computations  tenfold  smaller  than  a  method  that  did  not  exploit  the  sparsity 
structure  of  M. 

The  algorithms  derived  from  our  decomposition  method  are  well-suited  for  both 
sequential  and  parallel  architecture  machines.  This  work  covers  results  obtained 
from  implementations  on  a  sequential  machine,  whereas  future  work  will  cover 
results  obtained  from  implementations  on  parallel  architecture  machines. 

This  report  is  organized  as  follows:  Section  2  reviews  some  of  the  most 
common  methods  for  solving  the  sparse  linear  least  squares  problem.  These 
include  the  normal  equations  method,  sequential  row  orthogonalization  method 
(George  &  Heath,  1980)  and  multifrontal  QR  factorization  (Liu,  1986).  Section  3 
presents  the  main  decomposition  result;  Section  4  discusses  various  sparsity 
structures  suitable  for  the  proposed  decomposition  result  including  the  block 
bordered  triangular  form;  Section  5  gives  two  high-level  implementations  of  the 
decomposition  result;  Section  6  reviews  four  methods  available  in  the  linear 
algebra  package  Matlab  (Mathworks,  1990)  for  solving  dense  and  sparse  linear 
least  squares  problems;  Section  7  presents  a  procedure  to  structure  the 
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overdetermined  matrix  arising  from  the  signal  processing  application  into  the  block 
bordered  triangular  form,  and  discusses  the  key  computational  issues  in 
permutating  an  arbitrary  nonsqaure  matrix  into  block  bordered  triangular  form; 
Section  8  outlines  some  of  the  parallel  features  inherent  in  our  algorithms,  and 
concludes  with  numerical  results  and  comparisons. 


2.0  BACKGROUND  AND  MOTIVATION 

The  system  of  normal  equations 


MTMx  =  MTb  (3) 

plays  a  pivotal  role  in  many  methods  developed  for  the  solution  of  the  linear  least 
squares  problem.  If  M  has  full  column  rank,  then  the  solution  of  the  linear  least 
squares  problem  (2)  is  given  by  the  solution  of  the  system  of  normal  equations  (3). 
This  leads  to  the  following  three  important  contributions  (George  &  Heath,  1980; 
Heath,  1984;  Liu,  1986)  for  solving  the  sparse  linear  least  squares  problem. 

1)  Normal  Equations  Method.  For  any  n  by  n  permutation  matrix  P,  PTMTMP  is 
symmetric  positive  definite  since  matrix  M  has  full  column  rank.  As  a  result,  the 
normal  equations  method  leads  to  the  following  algorithm  (Heath,  1984)  for  the 
solution  of  the  sparse  linear  least  squares  problem. 

procedure  normal_equations: 

begin 

determine  symbolic  structure  of  MTM; 

find  permutation  matrix  P  so  that  PTMTMP  has  sparse  upper  Cholesky  factor ; 
factor  PTMTMP  symbolically; 

comment  this  generates  a  row-oriented  data  structure  for  Cholesky  factor; 

compute  MtM  and  MTb  numerically; 

perform  Cholesky  factorization  PTMTMP  =  RTR; 

solve  RTz  =  PTMTb,  Ry  =  z  and  x  =  Py  in  that  order 

end 

Although  the  normal  equations  method  is  appealing  for  its  simple  formulation,  it 
may  give  rise  to  serious  loss  of  information  in  the  explicit  computation  of  the 
products  MtM  and  MTb.  Also,  the  condition  number  of  MTM  is  the  square  of  the 
condition  number  of  M,  and  so  an  accurate  solution  of  the  system  (1)  may  be 
extremely  difficult  to  compute  if  M  is  ill-conditioned.  To  avoid  these  potential 
numerical  instabilities  associated  with  the  normal  equations  method,  George  and 
Heath  (1980)  combined  orthogonalization  with  the  normal  equations  method  to 
arrive  at  the  following  important  method  for  solving  the  sparse  linear  least  squares 
problem. 

2)  Sequential  Row  Orthogonalization  Method.  Let  Q  be  an  m  by  m  orthogonal 
matrix  and  let  P  be  any  n  by  n  permutation  matrix  so  that 
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(•») 


QTMP  -  [o] , 

where  R  is  an  n  by  n  upper  triangular  matrix  and  0  is  m-n  by  n  zero  matrix. 
Combining  (1)  and  (4),  and  partitioning  QTb  conformably  into  the  direct  sum  of  f 
and  h,  gives  rise  to  the  upper  triangular  system 


Ry  =  f  (5) 

in  which  y  ■  PTx.  The  vector  x  =  Py  obtained  subsequent  to  the  solution  of  (5)  is  a 
solution  to  the  linear  least  squares  problem  (2)  since  the  2-norm  is  preserved 
under  orthogonal  transformations  (Golub,  1965).  The  matrix  Q  is  usually  obtained 
using  either  Householder  reflections  or  Givens  rotations,  or  by  Gram-Schmidt 
orthogonalization  (Golub  &  Van  Loan,  1989;  Lawson  &  Hanson,  1974). 

While  orthogonalization  eliminates  the  potential  numerical  instabilities 
encountered  in  the  normal  equations  method,  the  application  of  orthogonal 
transformations  to  the  matrix  MP  may  cause  severe  fill-in.  George  &  Heath  (1980), 
however,  noted  the  following  identity 

PtMtMP=  RtR,  (6) 

which  clearly  shows  that  the  upper  triangular  matrix  R  obtained  from  the  application 
of  orthogonal  transformations  to  the  matrix  MP  is  the  upper  Cholesky  factor  of  the 
symmetric  positive  definite  matrix  PTMTMP.  This  observation  led  George  and  Heath 
to  an  algorithm  which  combines  the  attractive  features  in  the  normal  equations 
method  and  orthogonalization  into  the  following  algorithm. 

procedure  sequential_row_orthogonalization: 

begin 

apply  the  first  three  steps  in  procedure  normal_equations; 

compute  R  and  f  by  applying  Givens  rotations  to  rows  of  [MP  b]  one  at  a  time; 

solve  Ry  *  f  and  x  =  Py  in  that  order 

end 

The  most  significant  contribution  of  the  sequential  row  orthogonalization 
method  is  highlighted  in  step  2  of  the  above  procedure  where  orthogonal 
transformations  are  carried  out  using  fixed  (static)  data  structure  for  R  (George  & 
Heath,  1980).  This  feature  makes  the  sequential  row  orthogonalization  method 
extremely  efficient  for  sparse  linear  least  squares  problems  when  MTM  is  assumed 
to  be  sparse. 

3)  Multifrontal  QR  Factorization  Method.  This  method,  originally  called  row 
merging  scheme  (Liu,  1986),  is  a  means  to  compute  the  upper  triangular  matrix  R 
in  the  QR  factorization  (4)  by  applying  Householder  reflections  to  small  dense 
submatrices  in  such  a  way  that  each  submatrix  is  factorized  independently  of  the 
others.  The  allocation  of  these  small  submatrices  (called  frontal  matrices)  and  the 
subsequent  treatment  at  the  completion  of  their  factorizations  is  accomplished  by 
exploiting  the  sparsity  structure  of  the  symmetric  matrix  R+RT. 
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In  what  follows,  we  highlight  the  main  motivation  that  has  led  to  the  formulation 
of  the  multifrontal  QR  factorization  method.  For  any  i  with  Is  i  <,  n,  let  r,  and  Cj  be  two 
arrays  so  that  M(n,  q)  is  the  submatrix  of  M  consisting  of  the  rows  containing  the 
nonzeros  in  the  ith  column  of  M,  and  the  columns  containing  at  least  one  nonzero 
in  the  submatrix  M(n,  :).  Then,  the  first  row  obtained  from  the  QR  factorization  of  the 
submatrix  M(n,  Cj)  is  the  first  row  of  matrix  R  in  (4)  for  the  case  where  P  =  I.  Now  let 
M(:,  j)  be  any  other  column  of  M  such  that 


M(:,i)TM(:,j)  =  0.  (?) 

Assume  that  no  cancellation  of  nonzero  elements  takes  place  in  (7),  Then  no  row  of 
M  contains  nonzero  entries  in  both  M(:,  i)  and  M(:,  j),  and  so  the  first  row  obtained 
from  the  QR  factorization  of  the  submatrix  M(rj,  cj)  is  also  a  row  of  the  matrix  R  in  (4) 
for  the  case  where  P  =  I.  Consequently,  for  the  case  where  P  =  I  the  first  two  rows  of 
R  can  be  obtained  by  independent  QR  factorizations  of  the  submatrices  M(n,  Cj)  and 
M(rj,  Cj).  Such  submatrices  are  called  frontal  matrices,  and  the  method  derived  from 
the  use  of  frontal  matrices  takes  the  following  algorithmic  form. 

procedure  mu!tifrontal_QR_factorization: 

begin 

apply  the  first  two  steps  in  procedure  normal_equations; 

set  M  to  MP; 

while  M  has  any  column  do 
begin 

determine  frontal  matrices  in  M; 

for  each  frontal  matrix  M  in  M  do 

begin 

use  Householder  reflections  to  compute  QR  factorization  QTM  ; 

comment  first  row  of  QTM  is  a  row  of  R  in  (4); 

replace  submatrix  M  of  M  by  factorized  frontal  matrix  QTM 

end; 

delete  from  M  first  row  and  column  of  each  factorized  frontal  matrix 

end 

end 

By  equality  (7),  each  pass  of  the  for  loop  in  the  above  procedure  can  be  done 
independently  of  the  other  passes.  This  feature  makes  the  multifrontal  QR 
factorization  method  well-suited  for  parallel  computation.  However,  the  overall 
effectiveness  of  the  method  rests  on  how  efficiently  the  frontal  matrices  in  M  are 
allocated  and  formed  at  each  pass  of  the  while  loop.  This  subject  is  covered  next 
using  a  graph-theoretic  setting.  Our  definitions  are  from  (Tarjan,  1983). 

Let  G  *  (V,  E)  be  the  directed  graph  of  the  upper  Cholesky  factor  R.  Then  G  is  an 
acyclic  graph  since  R  is  an  upper  triangular  matrix,  and  so  there  exists  in  G  at  least 
one  vertex  v  such  that  no  edge  in  G  enters  v.  A  vertex  such  as  v  is  called  a  root.  Let 
X  denote  the  set  of  roots  in  G.  We  will  now  show  that  each  root  in  G  leads  to  a 
distinct  frontal  matrix.  These  are  the  frontal  matrices  (|X|  in  total)  that  are 
determined  at  the  first  line  of  the  while  loop,  and  factorized  at  the  first  pass  of  the 
for  loop. 
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Let  G'  =  (V,  E’)  be  the  undirected  graph  of  the  symmetric  matrix  R  +  RT.  We  want 
to  show  that  X  is  an  independent  set  (no  two  vertices  are  adjacent)  in  G'.  Assume 
for  contradiction  that  there  exists  in  X  a  pair  of  vertices  v  and  w  so  that  v  is  adjacent 
to  w  in  G'.  Then  (v,  w)  is  an  edge  in  G\  and  so  by  the  construction  of  G  and  G'  it 
foilows  that  there  exists  in  G  a  directed  edge  that  either  leaves  v  and  enters  w  or 
leaves  w  and  enters  v.  In  each  case,  we  have  established  a  contradiction  since 
both  v  and  w  are  roots  in  G.  Consequently,  X  is  an  independent  set  in  G\  which 
means  that  X  is  an  independent  set  in  the  undirected  graph  of  PTMTMP  since  this 
graph  is  a  subgraph  of  G*.  But  if  X  is  an  independent  set  in  the  undirected  graph  of 
PtMtMP,  then  each  pair  of  columns  in  MP  corresponding  to  a  pair  of  vertices  in  X 
satisfy  relation  (7).  Hence,  each  root  in  G  leads  to  a  frontal  matrix  in  MP. 

Let  M  be  the  m  -  |X|  by  n  -  |X|  matrix  obtained  at  the  completion  of  the  last  step  in 
the  while  loop.  By  the  property  that  the  first  row  of  each  factorized  frontal  matrix  in 
the  first  pass  of  the  for  loop  is  a  row  of  R  in  (4),  the  second  pass  of  the  for  loop  will 
produce  the  next  set  of  rows  of  R  in  (4).  This  process  is  repeated  until  all  n  columns 
of  M  have  been  deleted,  in  which  case  the  while  loop  terminates  and  the 
computation  of  the  upper  triangular  matrix  R  in  (4)  is  completed. 

In  what  follows,  we  cover  some  of  the  graph-theoretic  details  needed  to  form  the 
frontal  matrices  required  at  the  start  of  the  second  through  last  pass  of  the  for  loop, 
and  also  to  connect  our  interpretation  with  the  one  that  is  usually  used  to  describe 
the  multifrontal  QR  factorization  method. 

Let  us  set  Xo  to  X,  and  define  Xi  as  the  set  of  roots  in  the  induced  subgraph 
G(V-Xo).  Since  no  vertex  in  Xi  is  in  the  initial  set  of  roots  Xp,  no  vertex  in  Xi  is  a 
root  in  G,  which  means  that  for  every  root  x  in  G(V-Xo),  there  is  at  least  one  edge  in 
G  which  leaves  a  root  in  G  and  enters  x.  We  will  call  the  vertex  v  in  Xo  a  parent  of 
the  vertex  x  in  Xi.  With  this  notation  in  hand,  it  is  not  hard  to  show  that  the  frontal 
matrix  associated  with  a  root  x  in  G(V-Xo)  is  derived  from  the  frontal  matrices 
associated  with  the  parents  of  x  in  Xo.  It  is  important  to  note,  however,  that  no  two 
distinct  vertices  x  and  y  in  Xi  have  a  common  parent  in  Xo-  Otherwise,  the  pair  (x,  y) 
will  be  an  edge  in  the  induced  subgraph  G(V-Xo)  which  contradicts  the  assumption 
that  x  and  y  are  roots  in  G(V-Xo). 

We  complete  the  description  of  the  multifrontal  QR  factorization  method  with  the 
following  algorithm. 
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procedure  roots: 

begin 

compute  the  set  of  roots  Xo  in  G  ; 

V«-V-Xo; 

i  <-  0 ; 

while  V  is  not  empty  do 
begin 
i  < —  i  +  1  j 

compute  the  set  of  roots  Xj  in  G(V) ; 

for  each  x  in  Xj  do  compute  the  set  of  parents  of  x  in  Xj  .i ; 

V*-V-X i 

end 

end 

Let  h  denote  the  value  of  the  integer  i  at  the  completion  of  procedure  roots. 

Then  Xo,  Xi . Xh  are  the  sets  of  roots  computed  at  the  completion  of  procedure 

roots.  Note  that  all  parents  of  a  root  in  any  Xj  are  in  the  set  of  roots  Xj  .j  for  all  i  >  0. 
This  means  that  any  edge  that  has  its  end  points  in  two  non-consecutive  sets  of 
roots  is  redundant  as  far  as  the  computation  of  a  frontal  matrix  is  concerned.  Let 
G*  ■  (V,  E*)  be  the  graph  obtained  from  G  by  deleting  all  such  edges.  Then  the 
undirected  version  of  G*  obtained  from  G  by  replacing  each  directed  edge  by  an 
undirected  edge  is  the  elimination  tree  of  the  undirected  graph  G',  and  h  is  the 
height  of  the  elimination  tree.  An  elimination  tree  is  usually  the  means  for 
describing  the  multifrontal  QR  factorization  method  (Liu,  1986).  However,  this 
approach  requires  several  constructs  which  were  not  needed  in  our  interpretation. 

The  problem  of  finding  a  permutation  matrix  P  such  that  PTMTMP  has  sparse 
upper  Cholesky  factor  is  crucial  in  each  of  the  three  mehods  covered  herein. 
Therefore,  the  assumption  that  MTM  is  a  sparse  matrix  is  critically  important  since 
these  methods  may  behave  very  poorly  for  the  case  where  the  matrix  MTM  is 
dense. 

In  this  work,  we  expand  the  choice  of  methods  for  solving  the  sparse  linear  least 
squares  problem  by  developing  a  method  that  efficiently  handles  the  case  where  M 
is  a  sparse  matrix  while  MTM  is  dense,  and  possibly  full.  The  need  to  deal  with  this 
case  was  motivated  by  an  important  Navy  signal  processing  application  in  which  M 
is  a  highly  sparse  large  overdetermined  matrix  and  MTM  is  completely  full. 


3.0  A  DECOMPOSITION  BASED  ON  THE  SPARSITY  OF  M 

Let  P  and  S  be  permutation  matrices  such  that  PMS  is  a  2  by  2  block  matrix 


PMS 


(8) 


in  which  the  leading  block  A  is  an  N  by  N  square  and  nonsingular  matrix. 
Replacing  M  by  PMS  in  the  original  system  of  equations  Mx  =  b,  we  obtain  the 
following  equivalent  nonsquare  system 
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(PMS)(Stx)  =  (Pb).  (9) 

Consider  this  system  where  PMS  has  the  2  by  2  block  form  in  (8)  and  the 
vectors  STx  and  Pb  have  been  partitioned  conformably  into  the  direct  sums  of  y 
and  z,  and  u  and  v  respectively.  The  system  (9)  then  becomes 

[SSHll-B] 

Now  let  K  be  the  2  by  1  block  matrix  defined  by 

K  =  [c]-  ("> 

Since  block  A  is  nonsingular,  matrix  K  has  full  column  rank  and  so  there  exists  an 
orthogonal  matrix  Q  such  that  QTK  has  the  following  2  by  1  form 

0TK  =  [o].  (12) 


in  which  R  is  an  N  by  N  upper  triangular  matrix  with  nonzero  diagonal  entries  and  0 
is  a  zero  matrix.  Define  now  the  following  identity 


(13) 


where  X,  A,  f  and  h  have  the  same  dimensions  as  B,  D,  u  and  v  respectively.  Then, 
by  (9)  through  (13),  it  follows  that  the  following  two  nonsquare  systems  of  equations 

(QtPMS)(Stx)  =  (QTPb) 
and 

raw 

are  equivalent. 

Since  M  is  a  nonsquare  matrix  and  R  is  a  square  matrix,  it  follows  that  the  block 
A  in  (15)  is  a  nonsquare  matrix,  which  means  that  the  system  of  equations  Az  =  h  is 
nonsquare.  With  this  property  of  block  A,  we  are  in  position  to  state  the  main  result. 
To  facilitate  the  presentation  of  the  result,  we  will  designate  the  solution  to  problem 
(2)  as  a  least  squares  solution  of  system  (1). 


(14) 

(15) 


7 


Theorem  1 .  Suppose  the  m  -  N  vector  z*  is  a  least  squares  solution  of  the 
nonsquare  system 

Az  »  h. 

Then  the  n  vector  x*  defined  by 


x**S 


‘FT1*  f  -  Xz* 
z* 


’]• 


is  a  least  squares  solution  of  the  nonsquare  system  of  equations  Mx  =  b. 

Proof.  Assume  for  contradiction  that  the  n  vector  x *  is  not  a  least  squares 
solution  of  the  system  Mx  =  b.  Then  there  exists  another  n  vector  5  such  that 

II  M4  -  b  |12  <  ||  Mx*  -  b  ||2  , 

and  so  we  get 

II  (QtPMS)(St5)  -  (QTPb)  ||2  <  II  (QtPMS)(SV)  -  (QTPb)  ||2  .  (16) 

since  the  2*norm  is  preserved  under  orthogonal  transformations.  Now  let  y*  be  the 
N  vector  such  that 

SV  =  [£]  .  (17) 


Also,  let  us  partition  the  n-vector  STJ;  conformably  into  the  direct  sum  of  y  and  £. 
Then,  by  (14)  through  (17),  we  obtain  the  following  inequality 

(||RV  +  X?  -  f||  ♦  ||AC  -  hill  )'*  <  (||Ry*  +  Xz*  -  f|||  +  ||Az*  -  h||!  )'*.  (1 8) 


By  the  assertion  of  the  theorem  and  relation  (17),  however,  we  have 


Ry*  +  Xz*  -  f  =  0 . 


(19) 


So,  by  combining  relations  (18)  and  (19)  we  get  the  inequality 

||A;-h||2<||Az*-h||2, 

which  is  a  contradiction  since  z*  is  a  least  squares  solution  of  the  nonsquare 
system  Az  =  h.  This  completes  the  proof. 

Theorem  1  is  applicable  to  both  overdetermined  and  underdetermined  systems 
of  equations.  Also,  Theorem  1  is  applicable  to  both  full  rank  and  rank  deficient 
nonsquare  matrices.  If  M  has  full  rank,  then  the  matrix  A  has  full  rank.  Similarly,  if  M 
is  rank  deficient,  then  A  is  rank  deficient,  and  so  any  linear  least  squares  method 
chosen  for  the  solution  of  the  original  nonsquare  system  of  equations  Mx  =  b  can 
be  used  for  the  solution  of  the  smaller  nonsquare  system  Az  =  h  in  Theorem  1 . 
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In  computer  implementations  of  the  decomposition  in  Theorem  1 ,  we  compute 
the  orthogonal  matrix  Q  in  (12)  as  the  product  of  two  m  by  m  orthogonal  matrices 
Qi  and  Qz  so  that 


Qt-Q2Q{, 


(20) 


where  Qi  has  the  special  form 


(21) 


in  which  Iis  an  m  -  N  by  m  -  N  identity  matrix.  This  product  form  of  Q  allows  us  to 
use  the  N  by  N  orthogonal  matrix  Q  in  Qi  to  exploit  any  special  zero-nonzero 
structure  that  the  leading  block  A  might  have.  The  other  orthogonal  matrix  Q2  in  the 
product  form  is  chosen  so  that 

qI  (qTk)=  qJ  [QcA]  =  [0]  (22) 

where  R  is  an  N  by  N  upper  triangular  matrix  with  nonzero  diagonal  entries  and  0  is 
a  zero  matrix. 

The  product  form  of  Q  in  (20)  is  also  suitable  for  dealing  with  the  practical  case 
where  the  leading  block  A  in  PMS  is  a  singular  matrix.  This  is  done  as  follows.  Let  r 
denote  the  rank  of  the  N  by  N  singular  block  A.  Without  any  loss  of  generality, 
assume  that  the  leading  r  columns  of  A  are  the  linearly  independent  columns.  Then 
there  exists  an  N  by  N  orthogonal  matrix  Q  such  that 


in  which  R’  is  an  r  by  r  upper  triangular  matrix  with  nonzero  diagonal  entries  and 
the  0's  are  zero  matrices.  So,  by  combining  (8),  (21)  and  (23),  we  obtain  the 
following  2  by  2  block  matrix 

Q}PMS  .  [c-  D']  •  (24) 

The  N  -  r  by  r  zero  block  in  the  first  column  of  QTA  is  a  submatrix  of  the  block  C\ 
while  the  N  -  r  by  N  -  r  zero  block  in  the  second  column  of  QTA  is  a  submatrix  of  the 
block  D\  For  the  case  that  M  is  an  overdetermined  matrix  and  the  zero  block  in  the 
first  column  of  QTA  occupies  the  last  r  rows  of  block  C\  the  2  by  2  block  matrix  in 
(24)  takes  the  zero-nonzero  structure  shown  in  figure  1 . 
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Figure  1 .  Zero-nonzero  structure  of  Q{PMS  when  M  is  overdetermined. 

Our  next  task  is  to  zero  the  m  -  N  by  r  dark  shaded  nonzero  block  below  the 
upper  triangular  matrix  in  figure  1.  Let  K  be  the  2  by  1  block  matrix  defined  by 

K -["’•]  •  (25) 


Since  R'  is  an  upper  triangular  matrix  with  nonzero  diagonal  entries,  the  matrix  K 
has  full  column  rank,  and  so  there  exists  an  orthogonal  matrix  Q2  such  that 


(26) 


in  which  R  is  an  r  by  r  upper  triangular  matrix  with  nonzero  diagonal  entries  and  0 
is  m  -  r  by  r  zero  matrix.  Define  now 

ClTPb  =[“.],  (27) 

where  u'  and  are  r  and  m  -  r  vectors  respectively.  Also,  let 

GhMM.  <28> 

where  X,  A,  f  and  h  have  the  same  dimensions  as  B\  D\  u’  and  v'  respectively. 
Then,  by  (20),  (21)  and  (23)  through  (28),  it  follows  that  the  nonsquare  system  of 
equations  (15)  is  equivalent  to  the  following  nonsquare  system 


(29) 


in  which  y'  and  z'  are  r  and  m  -  r  vectors  respectively.  Figure  2  illustrates  the  zero* 
nonzero  structure  of  the  matrix  QTPMS  when  M  is  overdetermined. 
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Figure  2.  Zero-nonzero  structure  of  matrix  QTPMS  when  M  is  overdetermined. 

Note  that  the  N  -  r  by  N  -  r  zero  block  in  the  block  D*  in  Figure  1  remains  zero  in  the 
process  of  computing  the  block  A  in  figure  2  since  the  N  -  r  by  r  block  right  below 
the  upper  triangular  matrix  R'  in  figure  1  is  zero.  Theorem  1  is  now  directly 
applicable  to  the  case  where  the  leading  block  A  in  PMS  is  a  singular  matrix. 


4.0  EXPLOTING  THE  STRUCTURE  OF  LEADING  BLOCK  IN  PMS 

The  zero-nonzero  structure  of  the  leading  block  A  in  the  2  by  2  block  matrix 
PMS  is  central  in  the  effective  utilization  of  the  decomposition  in  Theorem  1 .  If  A  is 
a  dense  matrix,  then  Theorem  1  provides  no  advantage  over  other  methods  for 
solving  the  sparse  linear  least  squares  problem.  However,  when  A  is  a  large 
sparse  matrix  with  rich  structure,  the  benefits  gained  from  the  application  of 
Theorem  1  can  be  worthwhile. 

Consider  the  most  favourable  situation  where  the  leading  block  A  in  PMS  is  a 
block  diagonal  matrix.  Then  PMS  is  called  a  block  bordered  diagonal  matrix 
(Zhang,  Byrd  &  Schnabel,  1992).  The  following  two  observations  highlight  the  key 
advantages  in  using  a  bordered  block  diagonal  matrix  to  compute  a  least  squares 
solution  of  the  system  Mx  =  b.  First,  each  diagonal  block  of  A  can  be  orthogonalized 
independent  of  the  remaining  diagonal  blocks.  This  means  that  the  orthogonal 
matrix  Qi  in  (20)  can  be  computed  so  that  each  of  the  N  diagonal  blocks  of  A  is 
handled  by  a  different  processor  on  a  parallel  architecture  computer.  Second,  the 
matrix  R  in  (12)  and  R'  in  (23)  are  block  diagonal  matrices,  which  means  that  the 
sparsity  of  the  off-diagonal  part  of  the  block  diagonal  matrix  A  is  fully  preserved  at 
the  completion  of  orthogonalizations  in  (1 2)  and  (23). 

it  is  important  to  note  that  each  diagonal  block  of  the  block  diagonal  matrix  A  is 
a  frontal  matrix  in  M  if  the  off-diagonal  block  C  in  PMS  is  a  zero  matrix.  This  is  not 
hard  to  see  since  any  two  columns  of  A  taken  from  two  different  diagonal  blocks  will 
satisfy  equality  (7)  if  C  is  a  zero  matrix.  But  if  the  diagonal  blocks  of  A  are  frontal 
matrices,  then  by  the  graph-theoretic  interpretation  of  frontal  matrices  given  earlier 
there  must  exist  in  the  undirected  graph  of  the  symmetric  matrix  MTM  an 
independent  set  of  size  N.  However  large  independent  sets  are  generally  found  in 
graphs  that  are  quite  sparse.  Hence  for  the  case  where  MTM  is  assumed  to  be  a 
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dense  or  full  matrix,  the  diagonal  blocks  of  A  may  not  be  found  by  the  conventional 
methods  used  for  allocating  frontal  matrices. 

The  permutation  of  an  arbitrary  nonsquare  matrix  M  into  a  block  bordered 
diagonal  form  is  a  very  difficult  computational  problem.  Moreover,  the  existence  of 
large  block  diagonal  matrices  in  arbitrary  nonsquare  matrices  have  been  seldom 
reported.  Unlike  nonsquare  or  nonsymmetric  matrices,  block  bordered  diagonal 
forms  are  abundant  when  the  underlying  sparse  matrix  is  symmetric.  Industrial 
applications  in  which  block  bordered  diagonal  forms  are  extensively  utilized 
include  domain  decomposition,  VLSI  circuit  design,  structural  engineering,  and 
power  system  network  problems  (Zhang,  Byrd  &  Schnabel,  1992).  Also,  given  any 
sparse  structurally  symmetric  matrix  M,  the  work  in  Kevorkian  (1993)  gives  a  linear¬ 
time  algorithm  for  computing  permutation  matrices  P  and  S  so  that  PMS  has  a 
block  bordered  diagonal  form. 

Another  structural  form  of  A  that  retains  one  of  the  attractive  computational 
features  of  a  block  diagonal  matrix  is  the  block  triangular  form.  In  sharp  contrast  to 
block  bordered  diagonal  matrices,  2  by  2  block  matrices  in  which  the  leading  block 
is  a  block  triangular  matrix  are  encountered  in  real  industrial  and  government 
applications.  One  such  application  has  been  encountered  in  a  Navy  signal 
processing  problem  involving  the  prediction  of  bistatic  target  scattering  from 
monostatic  data  (Schenk,  1968;  Schenk,  1993;  Schenk  &  Benthien,  1989).  In  this 
particular  application,  the  Helmholtz  integral  equation  together  with  given 
boundary  conditions  are  discretized  to  arrive  at  the  following  algebraic  system 

Y  =  FX,  (30) 

in  which  Y  is  a  p  by  p  complex  symmetric  matrix  (scattering  function)  with  known 
diagonal  entries  (monostatic  data),  F  is  a  p  by  q  complex  matrix  (far-field 
propogator  function),  and  X  is  a  q  by  p  matrix  (surface  pressures).  The  objective  is 
to  use  the  monostatic  data  (diagonal  of  Y)  and  the  principle  of  reciprocity  (symmetry 
of  Y)  to  estimate  the  surface  pressures  (matrix  X).  This  is  done  next. 

Suppose  we  equate  the  diagonal  of  Y  with  the  diagonal  of  the  p  by  p  product 
matrix  FX  in  (30).  Then  we  get 


F(i.:)X(:,i)  =  Y(i,i)  i-1 . p(31) 

Next,  let  us  equate  the  (i,  j)  off-diagonal  element  of  the  symmetric  matrix  FX  with  its 
(j,  i)  element.  Then  we  have 


F(i,  :)X(:,  j)  =  F(j,  :)X(:,  i)  i  =  1 . p  -1 ;  j  =  i+1 ,...,  p  (32) 

Equations  (31)  and  (32)  together  form  a  system  consisting  of  p(p+1)/2  linear 
equations  and  pq  unknown  variables.  To  put  this  linear  system  of  equations  in 
matrix  form,  we  let  x  be  the  pq  vector 
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and  let  b  be  the  pq  vector 


(33) 

(34) 


where  0  Is  a  zero  column  vector.  Then,  by  combining  (31)  through  (34),  we  obtain 
the  following  nonsquare  system  of  equations 

Mx  =  b,  (35) 

in  which  M  is  a  p(p+1)/2  by  pq  complex  matrix.  For  the  special  case  where  p  =  37 
and  q  a  16,  M  is  a  703  by  592  complex  matrix  with  the  zero-nonzero  structure 
shown  in  figure  3.  This  plot  of  matrix  M  was  obtained  using  the  Mspy"  utility  in 
version  4  of  the  linear  algebra  package  Matlab  (Mathworks,  1990). 

Three  key  properties  characterize  the  sparse  linear  least  squares  problem  in 
(35).  These  are  as  follows: 

(a)  MTM  is  a  full  matrix, 

(b)  M  contains  q  nonzeros  in  first  p  rows  and  2q  nonzeros  in  remaining  rows, 

(c)  M  is  a  rank  deficient  matrix. 

By  property  (a),  all  three  methods  of  normal  equations,  sequential  row 
orthogonalization  and  multifrontal  QR  factorization  would  produce  poor  results  for 
this  signal  processing  application.  By  property  (b),  there  does  not  exist  in  M  a  small 
subset  of  rows  whose  deletion  will  render  the  matrix  MTM  sparse.  By  property  (c), 
the  bistatic  target  scattering  problem  is  an  inherently  difficult  problem  whose 
solution  may  require  the  use  of  singular  value  decomposition  at  some  stage  of  the 
computation  process.  This  feature  will  be  covered  in  more  detail  later  on. 

Exploitation  of  the  sparsity  structure  of  matrix  M  in  figure  3  has  led  us  to  a 
computer  implementation  that  generates  permutation  matrices  P  and  S  so  that  the 
N  by  N  leading  block  A  in  PMS  is  a  p  by  p  block  upper  triangular  matrix  with  the 
following  two  distinct  properties:  (1 )  all  p  diagonal  blocks  are  full  square  matrices, 
(2)  the  leading  p-q+1  diagonal  blocks  of  A  are  q  by  q  matrices  while  the  remaining 

q-1  diagonal  blocks  have  sizes  q-1,  q-2 . 2,  1  in  that  order.  As  an  immediate 

consequence  of  the  second  property  of  the  leading  block  A  in  PMS,  we  obtain 

N  =  pq-q(q-1)/2.  (36) 

Thus,  for  the  case  where  p  =  37  and  q  =  16,  we  have  N  =  472,  which  shows  that  a 
large  portion  of  the  original  703  by  592  matrix  M  is  a  block  upper  triangular  matrix. 
Figure  4  shows  the  zero-nonzero  structure  of  the  2  by  2  block  matrix  PMS. 
Consistent  with  the  definition  of  a  block  bordered  diagonal  matrix,  we  call  a  2  by  2 


13 


Figure  3 
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Unstructured  form  of  bistatic  target  scattering  problem. 
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Figure  4.  Structured  form  of  bistatic  target  scattering  problem. 
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block  matrix  PMS  a  block  bordered  triangular  matrix  if  the  leading  block  A  of  PMS 
is  block  triangular. 

Suppose  PMS  is  any  block  bordered  triangular  matrix  in  which  the  leading 
block  A  ■  [A|]  1  is  a  p  by  p  block  upper  triangular  matrix  with  square  diagonal  blocks. 
Then,  the  following  result  highlights  a  numerical  property  of  PMS  that  will  be 
frequently  used  in  subsequent  developments. 

Lemma  1 .  Let  Q;  be  an  orthogonal  matrix  such  that  Q|Aa  is  upper  triangular,  and 


r°i 


Then  the  N  by  N  matrix  QTA  is  upper  triangular. 

Proof.  The  proof  is  obtained  by  premultiplying  A  by  QT. 


5.0  IMPLEMENTATION  OF  THE  DECOMPOSITION  IN  THEOREM  1 

In  this  section,  we  use  the  decomposition  in  Theorem  1  to  give  detailed 
algorithms  to  solve  the  sparse  linear  least  squares  problem  for  the  case  where 
M'M  is  dense.  To  make  the  algorithms  directly  applicable  to  the  bistatic  target 
scattering  problem  (35),  we  assume  that  the  leading  block  A  in  the  2  by  2  block 
matrix  PMS  is  a  block  triangular  matrix  with  dense  diagonal  blocks.  All  algorithms 
will  be  presented  in  the  Algol-like  language  in  Aho,  Hopcroft  &  Ullman  (1976) 
adopted  earlier. 

Let  kj  denote  the  order  of  the  ith  diagonal  block  of  the  p  by  p  block  triangular 
matrix  A  =  [Ay]  in  PMS.  The  purpose  of  the  orthogonal  matrix  Qi  in  (20)  is  to  zero 
the  strictly  lower  triangular  parts  of  the  p  diagonal  blocks  of  A.  Thus,  the 
premultiplication  of  A  by  the  transpose  of  the  orthogonal  matrix  Qi  will  zero  all  but 
top  element  of  the  kj  -  j  +  1  vector  x  defined  by 

x  =  A8(j:  kj,  j)  i  =  1 , ... ,  p;  j  =  1 . kj  -1 .  (37) 

The  other  orthogonal  matrix  Q2  in  the  product  (20)  is  called  upon  to  zero  block  C’  in 
the  2  by  1  block  matrix  K  in  (25),  and  so  the  premultiplication  of  K  by  the  transpose 
of  Q2  will  zero  all  but  top  element  of  the  m  -  N  +1  vector  x  defined  by 

X  =  [c'C j)]  i-1 . N.  (38) 

The  vector  x  in  both  (37)  and  (38)  is  a  dense  subcolumn  in  the  block  bordered 
triangular  matrix  PMS. 
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Since  MTM  is  a  dense  or  full  matrix,  the  matrix  R  in  (15)  and  (29)  is  a  full  upper 
triangular  matrix,  and  so  there  are  two  ways  to  proceed  with  the  implementation  of 
Theorem  1.  First,  set  up  a  row-oriented  data  structure  for  a  full  upper  triangular 
matrix  R,  and  subsequently  use  the  sequential  row  orthogonalization  method  in 
(George  &  Heath,  1980)  to  compute  R  by  applying  Givens  rotations  to  rows  of 
[PMS  Pb]  one  at  a  time.  Second,  avoid  the  use  of  sparse  data  structure  since  all 

there  submatrices  R,  X  and  A  in  (15)  and  (29)  are  full  matrices,  in  this  work,  we 
choose  the  second  approach  for  the  implementation  of  Theorem  1  since  memory 
savings  brought  about  from  the  use  of  sparse  data  structure  is  not  substantial  for 
problems  where  MTM  is  a  dense  or  full  matrix. 

Let  RS  and  CS  be  two  single  arrays  so  that 

M(RS,  CS)  =  [  PMS  Pb  ]. 

a.oj,  let  IA  be  a  (p+1)  array  defined  by 

IA  =  [1, 1+k, . 1+2?  kj], 

in  which  the  leading  p  elements  are  the  pointers  to  the  1st  row  of  the  p  diagonal 
blocks  of  the  block  triangular  matrix  A  whereas  the  (p+1 )  th  element  is  the  pointer  to 
the  first  row  of  blocks  C  and  D  in  PMS  as  well  as  the  first  column  of  blocks  B  and  D. 
From  these  three  relations,  one  can  readily  allocate  all  diagonal  blocks  of  A,  blocks 
B,  C,  and  D  and  the  sub-vectors  u  and  v  of  Pb  as  given  in  (10). 

For  clarity  of  presentation,  we  first  give  an  implementation  of  Theorem  1  in 
which  we  assume  that  the  leading  block  A  is  nonsingular.  Next,  we  cover  the  more 
challenging  case  where  the  leading  block  A  is  assumed  to  be  singular. 

5.1  LEADING  BLOCK  A  IS  NONSINGULAR 

The  following  algorithm  implicitly  computes  the  two  orthogonal  matrices  Qi  and 
Q2  defined  in  relation  (20).  The  integers  a  and  z  in  procedure  q_en_q  designate 
the  first  and  last  rows  of  each  diagonal  block  of  A  considered  in  the  algorithm. 

procedure  q_en_q: 
begin 

comment  compute  Q  (and  so  Qi ) ; 
for  i  <—  1  until  p  do 
begin 

a  <-  IA(i) ; 
z «-  IA(i+1)-1; 
for  j «-  a  until  z  -1  do 
hshldr(RS(j:  z)) 

end  ; 

comment  compute  Q2 ; 
for  j  <—  1  until  N  do 

hshldr([RS(j),  RS(N+find(M(RS(N+1 :  m),  CS(j))  *0))]) 

end 
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The  procedure  hshldr(rows)  called  in  q_en_q  uses  Householder  reflections  to 
zero  all  but  the  top  element  of  the  column  vector  w  =  M(rows,  CS(j)).  If  rows  = 
RS(j:  z),  then  x  is  the  vector  given  in  (37).  In  the  same  way,  if  rows  =  [RS(j), 
RS(N+find(M(RS(N+1:  m),  CS(j))  *  0))],  then  x  designates  the  nonzero  part  of  the 
vector  given  in  (38).  The  function  "find"  is  a  Matlab  utility  for  finding  the  indices  of 
the  nonzero  elements  in  a  vector. 

procedure  hshldr(rows): 

begin 

x  <-  M(rows,  CS(j))  ; 
normx «-  ||  x  ((2  ; 

If  normx  >  0  then 
begin 

8  f-  x(1)  +  sign(x(1))*normx; 

v  [1 ;  x(2:|  x  |)  /  8 ) ; 

w  <-  -  (2/(vT*v))*vT*M(rows,  CS(j:n+1)) ; 

M(rows,  CS(j:n+1)) <-  M(rows,  CSQm+l))  +  v*w 

end 

end 

Given  x  =  M(rows,  j),  procedure  hshfdr(rows)  computes  a  Householder  vector  v 
so  that  the  first  element  of  the  vector  (I  -  2vvT/vTv)x  is  nonzero  and  all  other 
elements  are  zero.  The  Householder  vector  v  computed  in  hshldr(rows)  is  defined 
in  the  standard  way  (Lawson  &  Hanson,  1974)  as 

v  =  x  +  sign(x(1))||x||2e1 

where  ei  is  the  first  column  of  |x|  by  |xj  identity  matrix.  The  signum  function  "sign" 
ensures  that  |  v(1)  |  ■  |  x(1)  |  +  ||  x  Ife,  which  means  that  ||  v  H2  2s  ||  x  H2,  and  so  large 
relative  errors  in  the  coefficient  2/vTv  can  be  avoided  in  the  process  of  computing 
the  product  (I  -  2wT/vTv)x.  In  addition  to  this  standard  practice,  we  also  follow  a 
guideline  established  in  (Golub  &  Van  Loan,  1989)  to  normalize  the  vector  v  so  that 
v(1)  =  1. 


Since  x  =  M(rows,  CS(j)),  the  submatrix  M(rows,  CS(j:n+1))  can  be  written  in  the 
following  augmented  form 

M(rows,  CS(j:n+1))  =  [  x  M(rows,  CS(j+1  :n+1))  ] . 

Combining  this  with  the  last  two  lines  in  procedure  hshldr  we  obtain  the  equality 


M(rows,  CS(j:n+1))  =  (I  -  -^-)  [  x  M(rows,  CS(j+1  :n+1))  ] . 

Thus,  at  the  completion  of  procedure  hshldr,  the  vector  (I  -  2wT/vTv)x  forms  the  first 
column  of  the  submatrix  M(rows,  CS(j:n+1)).  Therefore,  if  x  is  the  vector  in  (37), 
then  all  the  elements  below  the  main  diagonal  in  the  jth  column  of  block  An  will  be 
zero  at  the  completion  of  procedure  hshldr(rows).  Similarly,  if  x  is  the  vector  given 
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in  (38),  then  all  the  elements  below  the  main  diagonal  in  the  jth  column  of  matrix 
PMS  will  be  zero.  Hence  at  the  completion  of  procedure  q_en_q  we  will  obtain  the 
following  three  identities. 


R  «  M(RS(1:N),  CS(1:N)), 

[  X  f )  »  M(RS(1  :N),  CS(N+1  :n+1)) , 

[A  h]  =  M(RS(N+1  :m),  CS(N+1  :n+1 )) . 

At  this  point,  we  are  in  position  to  compute  a  least  squares  solution  of  the  smaller 
nonsquare  system  Az  =  h  in  Theorem  1.  However,  before  we  do  this,  we  require 
the  following  result  pertaining  to  the  upper  triangular  matrix  R. 

Lemma  2.  Suppose  the  N  by  N  leading  block  A  in  the  block  bordered  triangular 
matrix  PMS  is  nonsingular.  Then  the  upper  triangular  matrix  R  computed  in 
procedure  q_en_q  has  nonzero  main  diagonal. 

Proof.  Since  the  block  A  has  full  rank,  the  N  by  N  leading  block  QTA  is  an  upper 
triangular  matrix  R'  with  a  nonzero  main  diagonal  at  the  completion  of  the  first  for 
loop  in  procedure  q_en_q.  Consider  now  any  call  to  hshldr(rows)  in  the  second  for 
loop  in  procedure  q_en_q.  Then  x  =  M(rows,  CS(j)>  consists  of  the  vector  given  in 
(38).  Thus  R’(j,  j)  is  the  element  at  the  top  of  vector  x  which  is  nonzero  since  R'  has 
a  nonzero  main  diagonal.  This  means  that  the  top  element  in  the  vector 
(I  -  2wT/vTv)x  is  also  nonzero.  But  by  the  construction  of  the  vector  x,  R(j,  j)  is  the 
element  at  the  top  of  the  vector  (I  -  2wT/vTv)x.  Hence,  the  upper  triangular  matrix  R 
has  a  nonzero  main  diagonal  at  the  completion  of  the  second  for  loop  in  procedure 
q_en_q.  This  completes  the  proof. 

By  Lemma  2,  the  condition  normx  >  0  is  satisfied  at  each  call  to  hshldr,  and  so 
for  the  case  where  the  leading  block  A  is  nonsingular,  the  line  "if  normx  >  0  then" 
in  procedure  houshldr  can  be  deleted. 

5.2  COMPUTING  A  LEAST  SQUARES  SOLUTION  OF  Az  =  h 

Choosing  a  method  for  computing  a  least  squares  solution  of  the  sparse 
nonsquare  system  Mx  =  b  strongly  depends  on  the  rank  of  the  matrix  M.  If  M  has  full 
rank,  then  the  block  A  has  full  rank,  and  so  one  can  obtain  a  least  squares  solution 
of  the  nonsquare  system  Az  =  h  either  by  QR  factorization  or  by  Cholesky 
factorization  of  the  system  of  normal  equations 

(ATA)z  =  (ATh).  (39) 

Thus,  if  z*  is  the  solution  of  the  triangular  system  resulting  from  the  QR  factorization 
or  the  solution  of  the  normal  equation  (39),  then  the  n  vector  x*  defined  in  Theorem 
1  is  a  least  squares  solution  of  the  original  nonsquare  system  of  equations  Mx  =  b. 

In  the  bistatic  target  scattering  application,  the  overdetermined  matrix  M  is  rank 
deficient  and  so  the  block  A  is  rank  deficient  too.  Therefore,  the  solution  of  the 
system  of  equations  Az  =  h  by  QR  factorization  will  break  down  since  a  diagonal 
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element  in  the  triangular  system  will  be  zero.  Similarly,  the  use  of  the  system  of 
normal  equation  (39)  will  fail  since  the  matrix  ATA  is  singular  which  means  that  the 
Cholesky  factorization  will  break  down. 

In  sharp  contrast  to  the  QR  factorization,  normal  equations  and  other  linear  least 
squares  methods,  the  singular  value  decomposition  (SVD)  is  applicable  to  full  rank 
as  well  as  rank  deficient  matrices.  In  view  of  this  practical  consideration,  the  SVD  is 
our  choice  for  the  solution  of  the  nonsquare  system  Az  =  h. 

Given  any  m  by  n  nonsquare  matrix  M  with  rank  r,  there  exists  an  m  by  m 
orthogonal  matrix  U  and  an  n  by  n  orthogonal  matrix  V  so  that  UTMV  is  a  diagonal 
matrix  £  with  r  positive  diagonal  elements  in  decreasing  order  and  n  -  r  zeros 
(Forsyth  &  Moler,  1967).  Thus,  if  ci  through  an  denote  the  diagonal  elements  of  £, 
then  we  have 

a,  £  o2  £ ...  2:  cr  >  or+1  =  ...  =  on  =  0  . 

The  numbers  a,  through  an  are  called  the  singular  values  of  M. 

Now  let  U  and  V  be  orthogonal  matrices  so  that  UTAV  is  a  diagonal  matrix  £. 
Also,  let  r  denote  the  rank  of  A.  Then  the  nonsquare  system  of  equations  Az  =  h 
can  be  written  in  the  following  equivalent  form 

(UZVT)z  =  h.  (40) 

Now  since  the  block  A  is  a  matrix  with  rank  r,  the  leading  r  diagonal  elements  of  £ 
are  nonzero  whereas  the  remaining  n  -  r  elements  are  zero.  This  means  that  the 
system  of  equations  (40)  can  be  written  as 

U(:,  1  :r)£(1  :r,  1  :r)VT(1  :r,  :)z  =  h .  (41 ) 

Thus,  if  z*  denotes  the  solution  of  this  system  of  equations,  then  we  obtain 

z*  =  V(:,  1  :r)£(1  :r,  1  ir)'1  U(:,  1  :r)Th .  (42) 

The  vector  z*  is  a  least  squares  solution  of  the  nonsquare  system  Az  =  h. 

With  these  results,  the  implementation  of  the  decomposition  in  Theorem  1  takes 
the  following  algorithmic  form  when  the  leading  block  A  in  PMS  is  nonsingular. 
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procedure  ls_dec: 
begin 

q_en_q  ; 

use  SVD  to  compute  A  =  U*Z*VT  ; 
r  ♦-  rank(I) ; 

z*  V(:,  1  :r)*I(1  :r,  1  :ry'-U{:,  1  :r)T*h  ; 
y*  *-  R_1*(f  -  X*z*) ; 
x*  *-  s*[  y* ;  z*  ] 

end 


5.3  LEADING  BLOCK  A  IS  SINGULAR 

For  the  case  in  which  the  block  upper  triangular  matrix  A  is  singular,  our  primary 
objective  is  to  use  the  orthogonal  matrix  Ch  to  detect  the  columns  that  are  linearly 
independent  in  each  of  the  diagonal  blocks  of  A.  This  information  is  then  used  to 
update  arrays  RS  and  CS  so  that  the  premultiplication  of  matrix  M(RS,CS)  by  the 
transpose  of  the  orthogonal  matrix  Qi  gives  rise  to  the  2  by  2  block  matrix  given  in 
(24)  in  which  the  leading  block  R'  is  an  upper  triangular  matrix  with  nonzero 
diagonal  entries. 

For  the  detection  of  linearly  independent  columns,  we  use  a  Boolean  array  VC 
(Visited  Columns)  setting  VC(j)  =  1  if  and  only  if  the  jth  column  of  block  A  is  a 
linearly  independent  column  in  a  diagonal  block  of  A.  The  entire  algorithm  called 
ulsjdec  (Unrestricted  Least  Squares  Decomposition)  is  presented  below. 

procedure  uls_dec: 

begin 

VC  «-  zeros(1,  N)  ; 
q_and_q  ; 

use  SVD  to  compute  A  =  U*E*VT  ; 
r  <-  rank(E) ; 

z*  <-  V(:,  1  :r)*Z(1  :r,  1  :r)1*U(:.  1:r)T*h  ; 
y*  <-  R_1*(f-  X*z*) ; 
x*  <-  S*[  y* ;  z*  ] 

end 

Comparison  of  procedures  ls_dec  and  uls_dec  shows  that  the  key  difference 
between  the  two  procedures  is  the  replacement  of  procedure  q_en_q  by  q_and_q 
and  the  introduction  of  array  VC.  The  procedure  q_and_q  is  given  below. 
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procedure  q_and_q: 
begin 

comment  compute  Q  (and  so  Qi ) ; 
for  i 1  until  p  do 
begin 

a  4-  IA(i) ; 
z  4—  IA(i+1)  -1  ; 
root  4-  a ; 

for  j  4-  a  until  z  do 
hshldra(RS(root:  z)) 

end  ; 

comment  update  arrays  RS  and  CS  ; 

LI  4-find(VC  1) ; 

NN  4- 1  LI  | ; 

If  NN  <  N  then 
begin 

LD<-  find(VC  ==  0); 

RS  4—  ( RS(LI),  RS(N+1 :  m),  RS(LD)  ] ; 

CS(1  :N)  4-  [  CS(LI),  CS(LD)  ] ; 

N  4-NN 

end  ; 

comment  compute  Q2 ; 
for  j  4-  1  until  N  do 

hsh!drc([RS(j),  RS(N+find(M(RS(N+1 :  m),  CS(j))  *  0))]) 

end 

There  are  three  distinct  parts  in  procedure  q_and_q  (highlighted  by  the 
comment  lines).  The  first  and  third  parts  (which  compute  the  orthogonal  matrices 
Qi  and  Q2  respectively)  are  derived  from  procedure  q_en_q,  whereas  the  second 
part  in  procedure  q_and_q  concerns  the  update  of  the  arrays  CS  and  RS  using 
array  VC.  The  computation  of  the  array  VC  is  carried  out  in  procedure  hshldra 
called  in  the  first  part  of  q_and_q.  This  procedure  is  given  below. 

procedure  hshidra(rows): 

begin 

x  4-  M(rows,  CS(j))  ; 
normx  4-  ||  x  H2 ; 
if  normx  >  0  then 
begin 

VC(j)  4-  1; 
root  4-  root  +1 ; 

84-  x(1)  +  sign(x(1))*normx; 

v  4-  [1 ;  x(2:|  x  |)  /  8  ] ; 

w  4-  -  (2/(vT*v))*vT*M(rows,  CS(j‘.n+1)) ; 

M(rows,  CSG:n+1))  4-  M(rows,  CS(j:n+1))  +  v*w 

end 

end 
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Each  time  the  condition  normx  >  0  is  satisfied,  the  vector  x  is  nonzero,  which 
means  that  the  jth  column  of  the  ith  diagonal  block  of  the  block  upper  triangular 
matrix  A  is  linearly  independent.  Consequently,  VC(j)  is  set  to  1  and  the  integer  root 
is  incremented  by  1.  The  purpose  of  the  integer  root  is  to  insure  that  the 
triangularization  of  each  diagonal  block  of  A  is  properly  done.  Note  that  procedures 
hshlkdra  and  hshldr  are  the  same  if  the  first  two  lines  in  the  begin  block  of 
procedure  hshldra  are  deleted. 

At  the  completion  of  the  orthogonal  matrix  Qi,  the  computation  of  the  array  VC 
in  procedure  q_and_q  is  complete.  Subsequently  procedure  q_and_q  begins  with 
its  next  task  which  is  to  update  arrays  RS  and  CS  so  that  the  matrix  Q^RS.CS) 
has  the  2  by  2  block  form  given  in  (24).  To  do  the  update,  we  construct  an  array 
ILIC  of  pointers  to  the  linearly  independent  columns  detected  in  procedure  hshldra, 
and  let  NN  =  |  LI  |.  Thus,  if  NN  =  N,  then  we  concude  that  the  leading  block  A  in 
PMS  is  nonsingular,  and  so  we  skip  the  part  in  procedure  q_and_q  that  involves 
the  update  of  the  arrays  RS  and  CS. 

Suppose  NN  <  N.  Then,  the  leading  block  A  in  PMS  is  singular,  and  so  we 
proceed  with  the  update  of  the  arrays  RS  and  CS  in  procedure  q_and_q.  To  begin 
with,  we  construct  an  array  LD  of  pointers  to  the  columns  of  A  that  have  not  been 
marked  as  linearly  independent.  Now,  by  the  construction  of  procedure  hshldra,  the 
NN  by  NN  submatrix  M(RS(LI),  CS(LI))  in  M  is  an  upper  triangular  matrix  with 
nonzero  diagonal  entries  at  the  completion  of  the  first  part  in  procedure  q_and_q. 
Thus,  to  obtain  the  2  by  2  block  form  in  (24),  we  update  arrays  RS  and  CS  so  that 
rows  RS(LI)  and  columns  CS(U)  are  moved  to  the  front  of  arrays  RS  and  CS 
respectively.  As  for  the  remaining  rows  and  columns  of  block  A,  these  are  moved  to 
the  rear  of  array  RS  and  sub-array  CS(1:N)  respectively  so  that  the  matrix 
M(RS,  CS)  has  the  2  by  2  block  form  shown  in  figure  1  at  the  completion  of  the 
update.  It  should  be  noted  however  that  the  blank  parts  (zero  submatrices)  in 
blocks  C'  and  D'  shown  in  figure  1  may  contain  nonzero  entries  in  this  case  since 
the  triangularization  in  procedure  hshldra  is  confined  to  the  diagonal  blocks  of  the 
block  triangular  matrix  A.  The  advantages  for  executing  hshldra  in  this  way  will  be 
apparent  when  we  discuss  parallel  implementations  of  the  decomposition  in 
Theorem  1  later  on. 

The  purpose  of  the  orthogonal  matrix  Q2  computed  in  the  third  part  of  procedure 
q_and_q  is  to  zero  the  block  C'  given  in  (24).  Procedure  q_and_q  accomplishes 
this  task  by  calling  procedure  hshldrc  given  below. 

procedure  hshldrc(rows): 

begin 

x  <-  M(rows,  CS(j))  ; 
normx  <-  ||  x  ||2  ; 

8 «-  x(1)  +  sign(x(1))*normx; 

v «-  [1 x(2:|  x  |)  /  5  J ; 

w  <-  -  (2/(vT*v))*vT*M(rows,  CS(j:n+1)) ; 

M(rows,  CS(j:n+1))  <-  M(rows,  CSQ:n+1))  +  v*w 

end 
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6.0  SOLVING  THE  LINEAR  LEAST  SQUARES  PROBLEM 

USING  MATLAB 


The  Matlab  linear  algebra  package  (Mathworks,  1990)  provides  an  excellent 
computational  platform  for  numerical  comparisons.  There  are  at  least  four  distinct 
methods  to  compute  a  least  squares  solution  of  a  nonsquare  system  of  equations 
Mx  *  b  in  Matlab.  These  are  the  following: 

(I)  Normal  Equations  Method.  The  sparse  Matlab  implementation  of  the  normal 
equations  method  is  identical  to  the  algorithm  covered  earlier  in  section  2.  For  the 
bistatic  target  scattering  problem,  the  m  by  n  matrix  M  is  rank  deficient,  which 
means  that  the  n  by  n  matrix  MTM  is  singular  and  so  the  Cholesky  factorization  will 
break  down.  Also,  MTM  is  a  full  matrix  and  so  the  use  of  sparsity  data  structure  for 
solving  the  bistatic  target  scattering  problem  by  the  normal  equations  method  will 
be  disasterous  since  a  sparse  data  structure  will  be  used  to  handle  a  large  full 
matrix. 


(II)  Augmentation  Method.  If  we  split  the  system  of  normal  equations  (3)  into  the 
following  two  systems 


r  +  Mx  =  b 


MTr  =  0, 


then  we  arrive  at  the  following  2  by  2  block  matrix  system 


(43) 


proposed  in  (Hachtel,  1974)  for  solving  the  sparse  linear  least  squares  problem.  If 
the  identity  matrix  I  and  the  vector  r  in  (43)  are  replaced  by  al  and  (a-’r) 
respectively,  where  a  is  some  nonzero  constant,  then  we  have  the  form  used  in 
(Bjorck,  1967)  for  computing  linear  least  squares  solution  iteratively.  In  Matlab,  the 
2  by  2  block  matrix  in  (43)  is  ordered  by  the  minimum  degree  algorithm  and  then 
the  system  is  solved  by  elimination.  The  augmented  method  is  a  complicated  way 
of  forming  the  system  of  normal  equations,  and  so  this  algorithm  will  break  down  in 
the  process  of  solving  the  bistatic  target  scattering  problem  since  M  is  a  rank 
deficient  matrix. 


(III)  QR  Factorization.  Since  the  overdetermined  matrix  M  in  the  bistatic  target 
scattering  problem  is  rank  deficient,  the  upper  triangular  matrix  produced  by  the  QR 
factorization  of  matrix  M  will  have  a  zero  on  the  main  diagonal,  and  so  this  method 
will  also  break  down  in  solving  the  upper  triangular  system. 

(IV)  Singular  Value  Decomposition.  The  singular  value  decomposition  has 
been  the  only  means  in  Matlab  to  produce  reliable  least  squares  solutions  of  the 
overdetermined  system  of  equations  in  the  bistatic  target  scattering  application.  For 
this  important  practical  consideration,  we  have  opted  to  avoid  sparse  data 
structures  here  to  have  a  meaningful  comparison  basis  for  algorithms  ls_dec, 
uls_dec  and  method  (IV)  in  Matlab. 
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7.0  COMPUTING  PERMUTATION  MATRICES  P  AND  S 


To  promote  the  use  of  the  bistatic  target  scattering  overdetermined  system  of 
equations  (35)  for  comparison  purposes,  we  present  a  procedure  called  simsys(p, 
q)  that  simulates  the  overdetermined  system  (35)  for  any  positive  integers  p  and  q. 
This  simulated  version  of  the  bistatic  target  scattering  problem  retains  all  structural 
and  numerical  properties  of  the  original  problem.  This  procedure  is  as  follows. 

procedure  simsys(p,  q): 

begin 

comment  generate  m  by  n  zero  matrix  ; 
m  *-  p*(p+1  )/2  ; 
n  <-  p*q  ; 

M «-  zeros(m,  n) ; 

comment  compute  first  p  rows  of  0  -1  matrix  M  ; 
for  i  <—  1  until  p  do 

M(i,  1  +  (i  - 1  )*q:  i*q)  <-  ones(1 ,  q) ; 
comment  compute  remaining  rows  of  0  -1  matrix  M  ; 
for  i  <-  2  until  p  do 
begin 

a  <-  (i  -1)*(2*p  -  i)/2  ; 

P«-1+(i  -2)*q:(M)*q; 
for  j  i  until  p  do 
begin 

M(j  +  a,  (5)  ones(1,  q)  ; 

M(j,  1  +  (j  - 1  )*q:  j*q)  <-  ones(1 ,  q) 

end 

end  ; 

comment  compute  single  arrays  RS  and  CS  ; 
p_and_s  ; 

comment  select  nonzero  entries  of  matrix  M  and  vector  b  ; 
recast 

end 

For  the  case  p  =  37  and  q  =  16,  the  procedure  simsys  (excluding  procedures 
p_and_s  and  recast)  generates  the  703  by  592  overdetermined  matrix  M  shown  in 
figure  3. 

The  procedure  p_and_s  called  in  simsys  computes  row  and  column  sequences 
RS  and  CS  so  that 


M(RS,  CS)  =  [  PMS  Pb]. 

Procedure  p_and_s  also  computes  the  single  array  IA  used  in  procedures  q_en_q 
and  q_and_q  for  allocating  the  blocks  in  PMS  and  the  subvectors  in  Pb.  The  entire 
procedure  is  given  below. 
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procedure  p_and_s: 
begin 

comment  initialize  arrays  RS,  CS  and  IA  and  markers  VR,  VC; 
RS«-zeros(1,  m) ; 

CS«-1:n+1  ; 

VR  +-  zeros(1 ,  m) ; 

VC«-zeros(1.  n+1) ; 

comment  compute  parts  of  RS  and  CS  needed  to  construct  A; 

t<-p ; 

8<-1; 
k«-0; 
a«- 1; 

for  i «-  1  until  p  do 
begin 

IA(i) <-  a ; 

8«-8  +  t ; 

p  <-  min(x,  q)  -1  ; 
z<-a  +  p; 

RS(a:z)  <-  [i,  8:5  +  p  - 1 J; 

if  x  <  q  then  CS(a:z)  <-  k+1 :  k+  t  ; 

k  <-  i*q ; 

x  *-  x  -1 ; 

a<~z+1 

end  ; 

comment  compute  remaining  parts  of  RS  and  CS  ; 

IA(p+1)<-a ; 

N  «-z ; 

VR(RS(1:N))*-ones(1:N) ; 

VC(RS(1  :N))  <-  ones(1  :N) ; 

RS(N+1  :m)  *-  find(VR  ==  0); 

CS(N+1  :n+1)  f-  find(  VC  ==  0) 

end 

For  the  case  p  =  37  and  q  =  16,  the  application  of  procedure  p_and_s  to  the 
703  by  592  overdetermined  matrix  in  figure  3  produces  the  block  bordered 
triangular  matrix  shown  in  figure  4. 

The  procedure  recast  called  at  the  completion  of  procedure  p_and_s  in  simsys 
accomplishes  four  distinct  tasks.  These  are  as  follows: 

(a)  Fixing  rank  of  leading  block  A.  For  any  application  of  ls_dec,  procedure 
recast  modifies  the  main  diagonal  of  A  so  that  rank(A)  =  N.  Procedure  recast  does 
this  as  follows.  Since  m  =  p(p-1)/2,  n  =  pq  and  m  >  n,  we  have 

p>2q+1.  (44) 
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Also,  by  the  construction  of  the  0  -1  matrix  M  in  simsys,  each  row  in  the  first  p  rows 
of  M  has  exactly  q  Vs  while  each  of  the  remaining  m  -  p  rows  has  exactly  2q  1*s, 
and  so  by  (44)  we  obtain 

p  >  £|M(i,j)| ,  i  *1 . m. 

i*  i 

Thus,  if  the  main  diagonal  of  A  is  modified  so  that  each  diagonal  entry  is  set  to  p, 
then  A  becomes  a  positive  definite  matrix  with  rank(A)  =  N.  Procedure  recast 
accomplishes  this  task  by  using  the  relation 

M(RS(i),CS(i))  =  p ,  i  =  1 . N, 

since  A  ■  M(RS(1  :N),CS(1  :N)).  For  applications  requiring  a  singular  leading  block 
in  PMS,  we  modify  A  so  that  each  of  the  p-q+1  diagonal  blocks  of  A  has  exactly  one 
linearly  dependent  column.  All  remaining  q  -1  diagonal  blocks  of  A  are  converted 
to  positive  definite  matrices  by  setting  all  diagonal  elements  of  these  diagonal 
blocks  to  p. 

(b)  Modifying  block  D.  We  compute  in  the  bipartite  graph  of  block  D  a  maximal 
matching,  and  subsequently  we  set  the  locations  in  M  corresponding  to  the 
elements  of  the  maximal  matching  to  p.  This  task  creates  in  D  a  positive  definite 
matrix  of  size  equal  to  the  computed  maximal  matching. 

(c)  Converting  M  to  a  complex  matrix.  In  the  original  bistatic  target  scattering 
application,  the  matrix  is  complex  and,  so  to  create  a  similar  environment  in  our 
numerical  tests,  we  convert  M  to  a  complex  matrix  using  the  relation 

M  =  M  +  0.2  *  (V-1)  *  M . 

(d)  Choosing  right-hand  side  vector  b.  We  choose  b  so  that  component  i  of 
vector  b  is  equal  to  the  sum  of  the  absolute  values  of  the  nonzero  entries  in  row  i  of 
matrix  M.  For  the  case  where  A  is  nonsingular  (applications  of  procedure  ls_dec), 
this  task  combined  with  task  (b)  in  procedure  recast  produces  a  least  squares 
solution  x  in  which  each  component  is  equal  to  1 . 

7.1  COMPUTING  P  AND  S  FOR  THE  GENERAL  CASE 

Given  an  arbitray  nonsquare  sparse  matrix  M,  the  finding  of  permutation 
matrices  P  and  S  such  that  PMS  is  block  bordered  triangular  matrix  is  a 
computationally  very  difficult  problem  especially  when  certain  constraints  are 
imposed  on  the  size  of  the  leading  block  A  in  PMS.  In  the  special  case  where  M  is 
a  square  matrix,  the  permutation  of  M  to  block  bordered  triangular  form  is  closely 
related  to  the  NP-complete  problem  of  finding  a  minimum  feedback  vertex  set  in  the 
directed  graph  of  M. 

In  this  section  we  give  a  simple  greedy-type  linear  algorithm  to  put  a  nonsquare 
sparse  matrix  M  into  the  block  bordered  triangular  form  PMS.  The  method 
proceeds  as  follows.  First,  compute  in  the  bipartite  graph  of  matrx  M  a  maximal 
matching  H.  Let  k  =  |H|  and  assume  without  any  loss  of  generality  that  k  <  min(m,n). 
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Then  the  maximal  matching  H  gives  rise  to  a  permutation  matrix  P‘  so  that  PM  is  a 
2  by  2  block  matrix  in  which  the  leading  block  is  a  k  by  k  matrix  with  nonzero  main 
diagonal.  Let  G  ■  (V,  E)  be  the  directed  graph  of  the  leading  block  in  P'M.  Second, 
use  depth-first  search  (Aho,  Hopcroft  &  Ullman,  1976)  to  compute  the  strongly 
connected  components  of  G  in  topologically  sorted  order.  Let  p  be  the  number  of 
strongly  connected  components  in  G.  Then  at  the  completion  of  the  second  step  we 
have  permutation  matrices  P"  and  S  such  that  the  matrix  P"(P'M)S  is  a  2  by  2  block 
matrix  in  which  the  leading  block  is  a  p  by  p  block  triangular  matrix  of  size  k.  This 
completes  the  construction  of  the  block  bordered  triangular  matrix  PMS,  where 
P  »  P"P*. 


The  parameters  k  and  p  dictate  the  usefulness  of  the  block  bordered  triangular 
matrix  PMS  in  Theorem  1.  If  the  maximal  matching  H  computed  in  the  first  step  has 
a  small  cardinality,  then  the  block  triangular  part  in  PMS  will  be  small,  and  the 
advantages  of  the  decomposition  in  Theorem  1  will  be  limited.  Thus,  to  make  the 
greedy  algorithm  more  effective,  one  may  require  algorithms  that  produce  large 
maximal  matchings.  Similarly,  if  G  is  a  strongly  connected  graph  or  with  few 
strongly  connected  components,  then  the  leading  block  A  in  PMS  will  not  have  an 
interesting  structure  for  the  decomposition  in  Theorem  1  since  the  parameter  p  will 
be  small. 

For  the  case  where  G  contains  few  strongly  connected  components,  we  suggest 
the  following  step.  In  the  depth-first  search  tree  of  the  directed  graph  G,  there  are 
three  types  of  edges  (Aho,  Hopcroft  &  Ullman,  1976).  These  are  forward  edges, 
back  edges,  and  cross  edges.  By  construction,  each  back  edge  in  the  depth-first 
search  tree  gives  rise  to  a  cycle  in  G,  and  so  the  vertex  incident  with  the  largest 
number  of  back  edges  is  common  to  a  large  number  of  cycles  in  G.  Thus,  if  we 
remove  such  vertices  from  G,  then  the  resulting  graph  may  have  much  greater 
number  of  strongly  connected  components.  This  means  that  the  leading  block 
triangular  matrix  A  in  PMS  may  have  a  large  number  of  diagonal  blocks.  This 
approach  for  increasing  the  size  of  the  parameter  p  was  an  essential  step  in  our 
successful  structuring  of  the  bistatic  target  scattering  problem  into  the  block 
bordered  triangular  form  shown  in  figure  4. 

For  a  more  rigorous  treatment  of  this  problem,  more  research  effort  is  needed. 
We  hope  that  the  successful!  application  of  Theorem  1  to  a  real  practical  problem 
may  be  the  stimulus  for  further  research  in  this  important  area. 


8.0  NUMERICAL  RESULTS  AND  COMPARISONS 

The  linear  least  squares  algorithms  ls_dec  and  uls_dec  are  well-suited  for 
parallel  architecture  machines.  For  example,  the  computation  of  the  orthogonal 
matrix  Qi  in  procedures  q_en_q  and  q_and_q  can  be  readily  done  on  a  parallel 
machine  as  each  of  the  p  passes  of  the  for  loop  (used  for  computing  Qi)  is 
independent  of  the  other  p-1  passes.  Also,  the  computation  of  the  orthogonal  matrix 
Q2  can  be  done  in  such  a  way  that  the  parallel  architecture  of  a  machine  is  fully 
explored.  The  parallel  features  of  algorithms  ls_dec  and  uls_dec  will  be  the  topic  of 
further  work  in  this  area.  Here,  we  present  numerical  results  and  comparisons  to 
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demonstrate  the  effectiveness  and  accuracy  of  these  two  algorithms  on 
conventional  machines. 

Tables  1  and  2  summarize  the  results  obtained  from  the  application  of 
algorithms  ls_dec,  uls_dec  and  the  singular  value  decomposition  in  Matlab  to  five 
instances  of  the  bistatic  target  scattering  problem.  Column  1  in  these  tables  gives 
the  parameters  p  and  q  used  in  procedure  simsys  to  generate  the  m  by  n 
overdetermined  matrix  M.  Column  2  includes  the  size  of  matrix  M  in  each  of  the  five 
examples  while  column  3  gives  the  size  N  of  the  nonsingular  leading  block  A  in  the 
structured  matrix  PMS.  Columns  4  and  5  give  the  number  of  floating-point 
operations  (flops)  required  to  compute  a  least  squares  solution  while  columns  6 
and  7  give  a  measure  of  the  accuracy  of  the  computed  results. 

The  results  in  tables  1  and  2  reflect  the  effectiveness  of  the  decomposition  result 
in  Theorem  1  to  compute  a  least  squares  solution  of  an  overdetermined  rank 
deficient  system  using  singular  value  decomposition.  The  improvements  on  a 
parallel  architecture  machine  are  expected  to  be  more  substantial. 

Table  1 

Applications  of  ls_dec  and  Matlab  to  the  bistatic  target  scattering  problem 


(P.  q) 

m  by  n 

N 

flops 

(106) 

||Mx  -  b||2 
(io-12) 

ls_dec 

Matlab 

ls_dec 

Matlab 

(26.  10) 

351  by  240 

215 

73.86 

700.96 

2.44 

13.14 

(30.  13) 

465  by  390 

312 

185.87 

2057.60 

5.67 

18.62 

(37.16) 

703  by  592 

472 

630.98 

6764.70 

13.36 

21.57 

(42.  20) 

903  by  840 

650 

1413.20 

16958.00 

21.75 

61.56 

(44.  22) 

990  by  968 

737 

1897.10 

23350.00 

27.12 

62.74 
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Table  2 


Applications  of  uls_dec  and  Matlab  to  the  bistatic  target  scattering  problem 


(p.  q) 

m  by  n 

N 

flops 

(106) 

||Mx  -  b||2 
(10-'2) 

ls_dec 

Matlab 

ls_dec  Matlab 

(26,  10) 

351  by  240 

198 

92.04 

731.99 

3.24  4.97 

(30, 13) 

465  by  390 

294 

220.68 

1995.10 

8.29  7.86 

(37,16) 

703  by  592 

450 

720.31 

6565.40 

19.83  18.48 

(42,  20) 

903  by  840 

627 

1578.60 

16663.00 

37.35  22.46 

(44,  22) 

990  by  968 

714 

2109.70 

* 

45.53 

*  memory  limitations 
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